<HTML
><HEAD
><TITLE
>Clones</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.73
"><LINK
REL="HOME"
TITLE="DataparkSearch Engine 4.50"
HREF="index.en.html"><LINK
REL="UP"
TITLE="Indexing"
HREF="dpsearch-indexing.en.html"><LINK
REL="PREVIOUS"
TITLE="Stopwords"
HREF="dpsearch-stopwords.en.html"><LINK
REL="NEXT"
TITLE="Specifying WEB space to be indexed "
HREF="dpsearch-follow.en.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="datapark.css"><META
NAME="Description"
CONTENT="DataparkSearch - Full Featured Web site Open Source Search Engine Software over the Internet and Intranet Web Sites Based on SQL Database. It is a Free search software covered by GNU license."><META
NAME="Keywords"
CONTENT="shareware, freeware, download, internet, unix, utilities, search engine, text retrieval, knowledge retrieval, text search, information retrieval, database search, mining, intranet, webserver, index, spider, filesearch, meta, free, open source, full-text, udmsearch, website, find, opensource, search, searching, software, udmsearch, engine, indexing, system, web, ftp, http, cgi, php, SQL, MySQL, database, php3, FreeBSD, Linux, Unix, DataparkSearch, MacOS X, Mac OS X, Windows, 2000, NT, 95, 98, GNU, GPL, url, grabbing"><SCRIPT
SRC="http://www.google-analytics.com/urchin.js"
TYPE="text/javascript"></SCRIPT><SCRIPT
SRC="http://www.dataparksearch.org/ga.js"
TYPE="text/javascript"></SCRIPT></HEAD
><BODY
CLASS="sect1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000C4"
VLINK="#1200B2"
ALINK="#C40000"
><!--#include virtual="body-before.html"--><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>DataparkSearch Engine 4.50: Reference manual</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="dpsearch-stopwords.en.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 3. Indexing</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="dpsearch-follow.en.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="sect1"
><H1
CLASS="sect1"
><A
NAME="clones"
>3.5. Clones</A
></H1
><A
NAME="AEN765"
></A
><P
><TT
CLASS="literal"
>Clones</TT
> are documents whose Hash32 values are equal on all document sections. Identical copies of the same
document always have equal Hash32 values, which makes it possible to eliminate duplicate documents from a collection.
However, if only the <TT
CLASS="literal"
>title</TT
> section is defined in <TT
CLASS="filename"
>sections.conf</TT
>, all documents with identical titles
will be considered clones, even if their bodies differ.
</P
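><P
>As a minimal sketch of how to avoid such false clones, the fragment below defines both a <TT
CLASS="literal"
>body</TT
> and a <TT
CLASS="literal"
>title</TT
> section in <TT
CLASS="filename"
>sections.conf</TT
>, so that two documents have to match in both sections before they are treated as clones. It assumes the usual name, number, maximum length form of the <B
CLASS="command"
>Section</B
> command; the section numbers and lengths shown are illustrative values and are not taken from this page.
</P
><PRE
CLASS="programlisting"
># Illustrative values only (assumption): section name, section number, maximum length
Section body 1 256
Section title 2 128
</PRE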
><DIV
CLASS="sect2"
><H2
CLASS="sect2"
><A
NAME="detectclones_cmd"
>3.5.1. <B
CLASS="command"
>DetectClones</B
> command</A
></H2
><A
NAME="AEN774"
></A
><PRE
CLASS="programlisting"
>DetectClones yes/no
</PRE
><P
>Enables or disables clone detection and elimination. When enabled, indexer
detects identical documents available at different locations, such as
mirrors, and indexes only one document from each group of
identical documents. "DetectClones yes" also reduces space usage.
The default value is "yes".
<PRE
CLASS="programlisting"
>DetectClones no
</PRE
>
</P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="dpsearch-stopwords.en.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.en.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="dpsearch-follow.en.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Stopwords</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="dpsearch-indexing.en.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Specifying WEB space to be indexed</TD
></TR
></TABLE
></DIV
><!--#include virtual="body-after.html"--></BODY
></HTML
>